Semi-Supervised Cause Identification from Aviation Safety Reports
Authors
Abstract
We introduce cause identification, a new problem involving the classification of incident reports in the aviation domain. Specifically, given a set of pre-defined causes, a cause identification system seeks to identify all and only those causes that can explain why the aviation incident described in a given report occurred. The difficulty of cause identification stems in part from the fact that it is a multi-class, multi-label categorization task, and in part from the skewness of the class distributions and the scarcity of annotated reports. To improve the performance of a cause identification system for the minority classes, we present a bootstrapping algorithm that automatically augments a training set by learning from a small amount of labeled data and a large amount of unlabeled data. Experimental results show that our algorithm yields a relative error reduction of 6.3% in F-measure for the minority classes in comparison to a baseline that learns solely from the labeled data.
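The abstract does not spell out the bootstrapping procedure, so the following is only a generic self-training sketch under stated assumptions, not the authors' algorithm. It assumes one binary classifier per minority cause (a tiny add-one-smoothed Naive Bayes stands in for whatever learner the paper actually uses) and, in each round, moves the most confidently positive unlabeled reports into the training set. The names `bootstrap`, `BinaryNB`, `relative_error_reduction`, `per_round`, `rounds` and `threshold`, as well as the toy reports and cause labels, are hypothetical. The relative error reduction helper assumes the common convention that error is 100 minus the F-measure.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical whitespace tokenizer; a real system would preprocess more carefully.
    return text.lower().split()


class BinaryNB:
    """Tiny multinomial Naive Bayes for one cause: positive vs. negative reports."""

    def fit(self, docs, labels):
        self.counts = {True: Counter(), False: Counter()}
        self.ndocs = {True: 0, False: 0}
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
            self.ndocs[y] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def log_odds(self, doc):
        """log P(pos | doc) - log P(neg | doc) with add-one smoothing; > 0 favors the cause."""
        total = self.ndocs[True] + self.ndocs[False]
        score = 0.0
        for y, sign in ((True, 1.0), (False, -1.0)):
            s = math.log((self.ndocs[y] + 1) / (total + 2))  # smoothed class prior
            denom = sum(self.counts[y].values()) + len(self.vocab)
            for tok in tokenize(doc):
                s += math.log((self.counts[y][tok] + 1) / denom)
            score += sign * s
        return score


def bootstrap(labeled, unlabeled, minority, per_round=1, rounds=3, threshold=0.0):
    """Self-training sketch (hypothetical): grow the training set for minority causes only.

    labeled:   list of (report_text, set_of_causes)
    unlabeled: list of report_text
    minority:  causes whose classifiers should receive extra training data
    """
    train = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        added = False
        for cause in minority:
            clf = BinaryNB().fit([d for d, _ in train],
                                 [cause in cs for _, cs in train])
            # Rank the remaining unlabeled reports by confidence that `cause` applies.
            ranked = sorted(((clf.log_odds(d), d) for d in pool), reverse=True)
            for score, doc in ranked[:per_round]:
                if score > threshold:
                    # Simplification: the new report gets only this cause, so it
                    # counts as a negative example for every other cause.
                    train.append((doc, {cause}))
                    pool.remove(doc)
                    added = True
        if not added or not pool:
            break  # stop when nothing new is confident enough or the pool is empty
    return train


def relative_error_reduction(f_baseline, f_new):
    """Relative error reduction, taking error as 100 - F (F on a 0-100 scale)."""
    return (f_new - f_baseline) / (100.0 - f_baseline)


# Toy, made-up data for illustration only.
labeled = [
    ("pilot fatigue long duty day", {"fatigue"}),
    ("radio noise garbled clearance", {"communication"}),
    ("radio frequency congestion missed call", {"communication"}),
    ("runway lights inoperative at night", {"other"}),
]
unlabeled = ["crew fatigue after long duty", "garbled radio call on frequency"]

train = bootstrap(labeled, unlabeled, minority=["fatigue"])
print([doc for doc, _ in train[len(labeled):]])  # prints ['crew fatigue after long duty']
print(relative_error_reduction(50.0, 55.0))  # toy F-measures; prints 0.1
```

Restricting augmentation to the minority causes mirrors the abstract's stated goal of helping those classes; the confidence threshold, the number of reports added per round, and the stopping rule are arbitrary choices in this sketch. The F-measures passed to `relative_error_reduction` are toy values, not results from the paper.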
Similar resources
Cause Identification from Aviation Safety Incident Reports via Weakly Supervised Semantic Lexicon Construction
The Aviation Safety Reporting System collects voluntarily submitted reports on aviation safety incidents to facilitate research work aiming to reduce such incidents. To effectively reduce these incidents, it is vital to accurately identify why these incidents occurred. More precisely, given a set of possible causes, or shaping factors, this task of cause identification involves identifying all ...
Application of diffusion maps to identify human factors of self-reported anomalies in aviation.
A study was conducted to investigate which factors lead pilots to submit voluntary anomaly reports regarding their flight performance. Diffusion Maps (DM) were selected as the method of choice for performing dimensionality reduction on text records for this study. Diffusion Maps have seen successful use in other domains such as image classification and pattern recognition. High...
Improving Performance of Classification Models with Textual Data
The main objective in this study is to measure the effect of unstructured text on classification performance. A large dataset of aviation incident reports was used in this study. In aviation incidents, the proportion attributable to human factors is close to 90%. Therefore, accurate identification of the presence of human factors in past aviation incidents is critical to improving aviation safet...
Learning Cause Identifiers from Annotator Rationales
In the aviation safety research domain, cause identification refers to the task of identifying the possible causes responsible for the incident described in an aviation safety incident report. This task presents a number of challenges, including the scarcity of labeled data and the difficulties in finding the relevant portions of the text. We investigate the use of annotator rationales to overc...
On the Over-Emphasis of Human ‘Error’ As A Cause of Aviation Accidents: ‘Systemic Failures’ and ‘Human Error’ in US NTSB and Canadian TSB Aviation Reports 1996-2003
It has been claimed that up to 80% of all aviation accidents are attributed to human ‘error’ (Johnson, 2003). This has important consequences as national and international initiatives focus on the reduction of operator ‘error’, for instance by increasing the levels of automation in Air Traffic Management. However, it is difficult to validate claims about the frequency of human ‘error’. This pap...